AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing SystemsMar-18-2026, 07:18:13 GMT

Spiking Transformer with Experts Mixture

Spiking Neural Networks (SNNs) provide a sparse spike-driven mechanism which is believed to be critical for energy-efficient deep learning. Mixture-of-Experts (MoE), on the other side, aligns with the brain mechanism of distributed and sparse processing, resulting in an efficient way of enhancing model capacity and conditional computation. In this work, we consider how to incorporate SNNs' spike-driven and MoE's conditional computation into a unified framework. However, MoE uses softmax to get the dense conditional weights for each expert and TopK to hard-sparsify the network, which does not fit the properties of SNNs. To address this issue, we reformulate MoE in SNNs and introduce the Spiking Experts Mixture Mechanism (SEMM) from the perspective of sparse spiking activation. Both the experts and the router output spiking sequences, and their element-wise operation makes SEMM computation spike-driven and dynamic sparse-conditional. By developing SEMM into Spiking Transformer, the Experts Mixture Spiking Attention (EMSA) and the Experts Mixture Spiking Perceptron (EMSP) are proposed, which performs routing allocation for head-wise and channel-wise spiking experts, respectively. Experiments show that SEMM realizes sparse conditional computation and obtains a stable improvement on neuromorphic and static datasets with approximate computational overhead based on the Spiking Transformer baselines.

artificial intelligence, machine learning, proceedings, (8 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)

Neural Information Processing SystemsFeb-19-2026, 00:50:54 GMT

137101016144540ed3191dc2b02f09a5-Paper-Conference.pdf

artificial intelligence, machine learning, natural language, (20 more...)

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications (0.71)
(2 more...)

Neural Information Processing SystemsOct-9-2025, 18:59:48 GMT

137101016144540ed3191dc2b02f09a5-Paper-Conference.pdf

computation, semm, spiking transformer, (16 more...)

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Communications (0.71)
(2 more...)

Neural Information Processing SystemsMay-26-2025, 16:44:03 GMT

Spiking Transformer with Experts Mixture

artificial intelligence, machine learning, spiking transformer, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.80)

arXiv.org Artificial IntelligenceAug-19-2024

Toward Large-scale Spiking Neural Networks: A Comprehensive Survey and Future Directions

Hu, Yangfan, Zheng, Qian, Li, Guoqi, Tang, Huajin, Pan, Gang

Deep learning has revolutionized artificial intelligence (AI), achieving remarkable progress in fields such as computer vision, speech recognition, and natural language processing. Moreover, the recent success of large language models (LLMs) has fueled a surge in research on large-scale neural networks. However, the escalating demand for computing resources and energy consumption has prompted the search for energy-efficient alternatives. Inspired by the human brain, spiking neural networks (SNNs) promise energy-efficient computation with event-driven spikes. To provide future directions toward building energy-efficient large SNN models, we present a survey of existing methods for developing deep spiking neural networks, with a focus on emerging Spiking Transformers. Our main contributions are as follows: (1) an overview of learning methods for deep spiking neural networks, categorized by ANN-to-SNN conversion and direct training with surrogate gradients; (2) an overview of network architectures for deep spiking neural networks, categorized by deep convolutional neural networks (DCNNs) and Transformer architecture; and (3) a comprehensive comparison of state-of-the-art deep SNNs with a focus on emerging Spiking Transformers. We then further discuss and outline future directions toward large-scale SNNs.

artificial intelligence, machine learning, natural language, (17 more...)

2409.02111

Country:

North America > United States (0.14)
Asia > China > Zhejiang Province > Hangzhou (0.04)

Genre: Overview (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceMar-25-2024

QKFormer: Hierarchical Spiking Transformer using Q-K Attention

Zhou, Chenlin, Zhang, Han, Zhou, Zhaokun, Yu, Liutao, Huang, Liwei, Fan, Xiaopeng, Yuan, Li, Ma, Zhengyu, Zhou, Huihui, Tian, Yonghong

Spiking Transformers, which integrate Spiking Neural Networks (SNNs) with Transformer architectures, have attracted significant attention due to their potential for energy efficiency and high performance. However, existing models in this domain still suffer from suboptimal performance. We introduce several innovations to improve the performance: i) We propose a novel spike-form Q-K attention mechanism, tailored for SNNs, which efficiently models the importance of token or channel dimensions through binary vectors with linear complexity. ii) We incorporate the hierarchical structure, which significantly benefits the performance of both the brain and artificial neural networks, into spiking transformers to obtain multi-scale spiking representation. iii) We design a versatile and powerful patch embedding module with a deformed shortcut specifically for spiking transformers. Together, we develop QKFormer, a hierarchical spiking transformer based on Q-K attention with direct training. QKFormer shows significantly superior performance over existing state-of-the-art SNN models on various mainstream datasets. Notably, with comparable size to Spikformer (66.34 M, 74.81%), QKFormer (64.96 M) achieves a groundbreaking top-1 accuracy of 85.65% on ImageNet-1k, substantially outperforming Spikformer by 10.84%. To our best knowledge, this is the first time that directly training SNNs have exceeded 85% accuracy on ImageNet-1K. The code and models are publicly available at https://github.com/zhouchenlin2096/QKFormer

complexity, qkformer, transformer, (16 more...)

2403.16552

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > China > Heilongjiang Province > Harbin (0.04)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

arXiv.org Artificial IntelligenceJan-22-2024

TIM: An Efficient Temporal Interaction Module for Spiking Transformer

Shen, Sicheng, Zhao, Dongcheng, Shen, Guobin, Zeng, Yi

Spiking Neural Networks (SNNs), as the third generation of neural networks, have gained prominence for their biological plausibility and computational efficiency, especially in processing diverse datasets. The integration of attention mechanisms, inspired by advancements in neural network architectures, has led to the development of Spiking Transformers. These have shown promise in enhancing SNNs' capabilities, particularly in the realms of both static and neuromorphic datasets. Despite their progress, a discernible gap exists in these systems, specifically in the Spiking Self Attention (SSA) mechanism's effectiveness in leveraging the temporal processing potential of SNNs. To address this, we introduce the Temporal Interaction Module (TIM), a novel, convolution-based enhancement designed to augment the temporal data processing abilities within SNN architectures. TIM's integration into existing SNN frameworks is seamless and efficient, requiring minimal additional parameters while significantly boosting their temporal information handling capabilities. Through rigorous experimentation, TIM has demonstrated its effectiveness in exploiting temporal information, leading to state-of-the-art performance across various neuromorphic datasets.

dataset, information, neural network, (12 more...)

2401.11687

Country: Asia > Japan > Honshū > Kantō > Kanagawa Prefecture > Yokohama (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceDec-13-2023

AutoST: Training-free Neural Architecture Search for Spiking Transformers

Wang, Ziqing, Zhao, Qidong, Cui, Jinku, Liu, Xu, Xu, Dongkuan

Spiking Transformers have gained considerable attention because they achieve both the energy efficiency of Spiking Neural Networks (SNNs) and the high capacity of Transformers. However, the existing Spiking Transformer architectures, derived from Artificial Neural Networks (ANNs), exhibit a notable architectural gap, resulting in suboptimal performance compared to their ANN counterparts. Manually discovering optimal architectures is time-consuming. To address these limitations, we introduce AutoST, a training-free NAS method for Spiking Transformers, to rapidly identify high-performance Spiking Transformer architectures. Unlike existing training-free NAS methods, which struggle with the non-differentiability and high sparsity inherent in SNNs, we propose to utilize Floating-Point Operations (FLOPs) as a performance metric, which is independent of model computations and training dynamics, leading to a stronger correlation with performance. Our extensive experiments show that AutoST models outperform state-of-the-art manually or automatically designed SNN architectures on static and neuromorphic datasets. Full code, model, and data are released for reproduction.

architecture, autost, transformer, (15 more...)

2307.00293

Country:

North America > United States > North Carolina (0.04)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)